-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: hot ranges api #74377
server: hot ranges api #74377
Conversation
e8a9be8
to
33ea61a
Compare
cc @abarganier |
fb968cd
to
c197dab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @koorosh)
pkg/server/serverpb/status.proto, line 1721 at r1 (raw file):
} rpc HotRanges2(HotRangesRequest) returns (HotRangesResponseV2) {
Can we reconsider the placement of this API call? We already have an /api/v2/
server that new stuff should go under and I'd prefer to use that if we can.
See https://github.com/cockroachdb/cockroach/blob/master/pkg/server/api_v2.go
c197dab
to
5cf487c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/serverpb/status.proto, line 1721 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Can we reconsider the placement of this API call? We already have an
/api/v2/
server that new stuff should go under and I'd prefer to use that if we can.See https://github.com/cockroachdb/cockroach/blob/master/pkg/server/api_v2.go
Done. I updated /api/v2/ranges/hot
api to provide updated version of Hot ranges, but I would prefer to keep /_status/v2/hotranges
as well until /api/v2/ranges/hot
handles authorization validation (for secure and insecure modes).
5cce471
to
76d43e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! It's great to see this coming together.
One general question I wanted to pose here is have we considered how this might work in a serverless context? With the upcoming project to improve tenant-level observability, we'd ideally like to be able to generate a hot ranges report for a specific serverless tenant. We're hoping to build some level of debug.zip functionality at the tenant level, and one of the pieces that we think will be useful for such a bundle is a hot ranges report, since this is relevant within the SQL domain.
This is not necessarily in scope for this PR, but we should think about it now instead of later to avoid having to build a HotRangesV3
endpoint a few months from now 😅. What would we need to do to make this work at the tenant-level?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @koorosh)
pkg/server/status.go, line 2073 at r4 (raw file):
} dbNames := make(map[uint32]string)
Can we possibly make a struct
type to more clearly keep track of all this information?
It seems like the key for all these maps is the catalog.Descriptor
ID, so instead of having multiple maps, can we have a single struct type that encompasses all these attributes (db names, table names, index names, and parent ID), and then just use a single map? Something like
type HotRangeReportMeta struct {
dbName string
tableName string
indexNames map[uint32]string
parentId int32
}
rangeReportMetas := make(map[uint32]HotRangeReportMeta)
...
for _, desc := range descrs {
rangeReportMetas[uint32(desc.GetId())] = HotRangeReportMeta{
...
}
}
Then the maintainer can get all the info they need with a single map access. Just an idea, let me know what you think. We could also potentially define functions on this type to encapsulate some of the more complex access patterns of this data, although maybe that's overkill :D
Code quote:
dbNames := make(map[uint32]string)
tableNames := make(map[uint32]string)
indexNames := make(map[uint32]map[uint32]string)
parents := make(map[uint32]uint32)
pkg/server/status.go, line 2106 at r4 (raw file):
var ranges []*serverpb.HotRangesResponseV2_HotRange // TODO (koorosh): how to flatten triple nested loop?
Nothing stands out to me that would allow us to avoid this, I don't see it as too big of an issue, personally :)
Code quote:
// TODO (koorosh): how to flatten triple nested loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1, 8 of 8 files at r4, 5 of 5 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @koorosh)
-- commits, line 8 at r4:
nit: can you reword this a bit? It seems there are two changes being made with v2
: not exposing sensitive data and creating a flat structure. Please make that a bit more clear.
-- commits, line 16 at r5:
nit: use backticks for fields
-- commits, line 21 at r5:
nit: can you add a note about why this would lead to additional store scans?
-- commits, line 28 at r6:
nit: can you clarify what the implementation was before and after the change. It's unclear from this sentence.
docs/generated/swagger/spec.json, line 1779 at r6 (raw file):
}, "hotRangeInfo": { "description": "Hot range details",
This should be more detailed.
pkg/server/status.go, line 2065 at r4 (raw file):
} func (s *statusServer) HotRanges2(
nit: docstring
pkg/server/serverpb/status.proto, line 1721 at r1 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
Done. I updated
/api/v2/ranges/hot
api to provide updated version of Hot ranges, but I would prefer to keep/_status/v2/hotranges
as well until/api/v2/ranges/hot
handles authorization validation (for secure and insecure modes).
SGTM
pkg/server/serverpb/status.proto, line 1126 at r5 (raw file):
message HotRangesResponseV2 { message HotRange {
nit: docstring
b2dc529
to
a8d7458
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
Previously, dhartunian (David Hartunian) wrote…
nit: can you reword this a bit? It seems there are two changes being made with
v2
: not exposing sensitive data and creating a flat structure. Please make that a bit more clear.
Done.
Previously, dhartunian (David Hartunian) wrote…
nit: use backticks for fields
Done.
Previously, dhartunian (David Hartunian) wrote…
nit: can you add a note about why this would lead to additional store scans?
Done.
Previously, dhartunian (David Hartunian) wrote…
nit: can you clarify what the implementation was before and after the change. It's unclear from this sentence.
Done.
docs/generated/swagger/spec.json, line 1779 at r6 (raw file):
Previously, dhartunian (David Hartunian) wrote…
This should be more detailed.
Done.
pkg/server/status.go, line 2065 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: docstring
Done.
pkg/server/status.go, line 2073 at r4 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Can we possibly make a
struct
type to more clearly keep track of all this information?It seems like the key for all these maps is the
catalog.Descriptor
ID, so instead of having multiple maps, can we have a single struct type that encompasses all these attributes (db names, table names, index names, and parent ID), and then just use a single map? Something liketype HotRangeReportMeta struct { dbName string tableName string indexNames map[uint32]string parentId int32 } rangeReportMetas := make(map[uint32]HotRangeReportMeta) ... for _, desc := range descrs { rangeReportMetas[uint32(desc.GetId())] = HotRangeReportMeta{ ... } }
Then the maintainer can get all the info they need with a single map access. Just an idea, let me know what you think. We could also potentially define functions on this type to encapsulate some of the more complex access patterns of this data, although maybe that's overkill :D
Done.
pkg/server/status.go, line 2106 at r4 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Nothing stands out to me that would allow us to avoid this, I don't see it as too big of an issue, personally :)
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general question I wanted to pose here is have we considered how this might work in a serverless context?
Just noting - we (myself, @koorosh, @rimadeodhar) had a chat about this today. The effort involved to make this work in a serverless context will begin with our work to get a minimal version of debug.zip
working for serverless tenants. Since debug.zip uses the original HotRanges endpoint, which is also used by this new version, the work we do on the original endpoint should carry over capabilities to the new version as well.
The effort involved should be somewhat minimal - we just need to filter down the ranges/replicas reported on to only those with the appropriate tenant key prefix.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
91e7800
to
2149fcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, I think this is almost ready. I just have a few docstring nits and an error condition that I think we should probably log.
There are also some test failures in the CI build - have you tried running make testrace
or make stressrace
locally? I've re-triggered them in case it was a spurious failure, but if not, let me know if I can help debug 👍
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @koorosh)
pkg/server/status.go, line 2101 at r26 (raw file):
_, tableID, err := s.sqlServer.execCfg.Codec.DecodeTablePrefix(r.Desc.StartKey.AsRawKey()) if err != nil { continue
nit: should we consider at least logging the error here?
Code quote:
continue
pkg/server/serverpb/status.proto, line 1071 at r26 (raw file):
// API: PUBLIC ALPHA double queries_per_second = 2; // LeaseholderNodeID indicates on Node ID that contains replica that is leaseholder
nit: // LeaseholderNodeID indicates the Node ID that is the current leaseholder for the given range.
Code quote:
// LeaseholderNodeID indicates on Node ID that contains replica that is leaseholder
pkg/server/serverpb/status.proto, line 1129 at r26 (raw file):
// HotRange message describes a single hot range, ie its QPS, node ID it belongs to, etc. message HotRange { // range_id indicates Range ID that's identified as hot range
nit: for this docstring and those below, let's make sure the sentences end with .
periods to stay consistent with the rest of the file
pkg/server/serverpb/status.proto, line 1134 at r26 (raw file):
(gogoproto.customname) = "RangeID" ]; // node_id indicates on node that contains current hot range
nit: // node_id indicates the node that contains the current hot range.
Code quote:
// node_id indicates on node that contains current hot range
pkg/server/serverpb/status.proto, line 1144 at r26 (raw file):
(gogoproto.customname) = "QPS" ]; // table_name indicates table which data is stored in this hot range
nit: // table_name indicates the SQL table that the range belongs to.
Code quote:
// table_name indicates table which data is stored in this hot range
pkg/server/serverpb/status.proto, line 1155 at r26 (raw file):
"github.com/cockroachdb/cockroach/pkg/roachpb.NodeID" ]; // leaseholder_node_id indicates on Node ID that contains replica that is a leaseholder
nit: // leaseholder_node_id indicates the Node ID that is the current leaseholder for the given range.
Code quote:
/ leaseholder_node_id indicates on Node ID that contains replica that is a leaseholder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out on this, I didn't run them. Will run locally and to check what's wrong
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/server/status.go, line 2101 at r26 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: should we consider at least logging the error here?
Done.
pkg/server/serverpb/status.proto, line 1071 at r26 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit:
// LeaseholderNodeID indicates the Node ID that is the current leaseholder for the given range.
Done.
pkg/server/serverpb/status.proto, line 1129 at r26 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: for this docstring and those below, let's make sure the sentences end with
.
periods to stay consistent with the rest of the file
Done.
pkg/server/serverpb/status.proto, line 1134 at r26 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit:
// node_id indicates the node that contains the current hot range.
Done.
pkg/server/serverpb/status.proto, line 1144 at r26 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit:
// table_name indicates the SQL table that the range belongs to.
Done.
pkg/server/serverpb/status.proto, line 1155 at r26 (raw file):
// leaseholder_node_id indicates the Node ID that is the current leaseholder for the given range.
Done.
e0f89de
to
8d27691
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we squash some of the commits in this PR? We shouldn't have any "nit" commits in history. I'd prefer if we just made this PR a single commit but I can understand if you want to break it up into 2 or 3. Right now it's too many.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
bors r+ |
LGTM |
Build failed (retrying...): |
Build failed: |
bors retry |
Build failed (retrying...): |
Build failed (retrying...): |
bors r- This checks in a |
Canceled. |
05d7a1a
to
82808af
Compare
This change implements second version of hot ranges api that's required for UI to represent enhanced Hot Ranges page. Before, Hot ranges (under Advanced debugging page) used former version of HotRanges api that provided information about hot ranges with internal/ sensitive data (see: cockroachdb#53212) that should not be exposed to users. Now, in addition to existing endpoint, additional one is implemented that is based on a former version and provides hot ranges information that is only needed for Hot Ranges page (range id, qps, table, db, and index names for particular range). The list of hot ranges and their QPS is provided by `HotRanges` service and then information like DB, table and index names are retrieved from range's `StartKey` that might include this info or not (in case if it's meta range, or range that stores index itself for instance). `HotRange` and `HotRangeV2` services expect the same request type as an argument but return different responses. `HotRangeV2` service returns a flat list of hot ranges instead of grouped ranges per node/store. Release note: None server: add leaseholder node id to hot ranges api Current change extends `statuspb.HotRangesResponse` to include `LeaseholderNodeID` field to indicate the node id that contains leaseholder replica for current hot range. This change was made in `localHotRanges` function (that is used by `HotRanges` that in turn used by `HotRangeV2` service) to reuse existing logic of iteration over the stores and querying hot ranges. It extends its response by `LeaseholderNodeID` field. Otherwise, the same logic should be implemented in `HotRangeV2` service by calling `VisitStores` iterator. Release note: None Release justification: bug fixes and low-risk updates to new functionality
Before, `listHotRanges` request handler of `apiV2Server` relied on `HotRanges` service that is now should be replaced by new `HotRangesV2` implementation. Current change reuses HotRangeV2 service in `api/v2/ranges/hot` api. It allows to share the same logic between REST and gRPC endpoints and gradually migrate to new version of API. Release note: None Release justification: bug fixes and low-risk updates to new functionality
Initially, `HotRangesV2` service in status server was defined to use GET method to handle HTTP request. It was convenient way to display response data in Advanced debugging page since it allowed to render response body right on to page. But now, hot ranges will be used on user facing page and request dispatching for hot ranges api should follow generic workflow: initialize `HotRangesRequest` protobuf message and dispatch request with `src/util/api` service. This restriction forces to use POST method (since GET method doesn't allow to provide request body). In addition, Hot Ranges debugging page is refactored to use `api` service. Release note: None Release justification: bug fixes and low-risk updates to new functionality
This change extends hot ranges API to support pagination of responses. It is also possible to avoid pagination when page size is set to 0. Release note: None Release justification: bug fixes and low-risk updates to new functionality
82808af
to
af62e80
Compare
bors r+ |
Build succeeded: |
This change implements second version of hot ranges api that's required
for UI to represent enhanced Hot Ranges view.
It reuses former implementation but doesn't expose sensitive internal data.
Instead, it provides flat structure of fields required for ui.
New
HotRangeV2
endpoint is established to:HotRange
endpointRelease note: None
Release justification: bug fixes and low-risk updates to new functionality